On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants

Authors

Sashank J. Reddi

Ahmed Hefny

Suvrit Sra

Barnabás Póczos

Alexander J. Smola

Abstract

We study optimization algorithms based on variance reduction for stochastic gradient descent (SGD). Remarkable recent progress has been made in this direction through development of algorithms like SAG, SVRG, SAGA. These algorithms have been shown to outperform SGD, both theoretically and empirically. However, asynchronous versions of these algorithms—a crucial requirement for modern large-scale applications—have not been studied. We bridge this gap by presenting a unifying framework for many variance reduction techniques. Subsequently, we propose an asynchronous algorithm grounded in our framework, and prove its fast convergence. An important consequence of our general approach is that it yields asynchronous versions of variance reduction algorithms such as SVRG and SAGA as a byproduct. Our method achieves near linear speedup in sparse settings common to machine learning. We demonstrate the empirical performance of our method through a concrete realization of asynchronous SVRG.
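To illustrate the kind of variance-reduced update the abstract refers to, here is a minimal serial SVRG sketch in plain Python. It is not the paper's asynchronous algorithm or its unifying framework. The problem (1-D least squares), the function name `svrg_least_squares`, and every parameter value are assumptions chosen only for illustration.

```python
import random


def svrg_least_squares(xs, ys, step=0.05, epochs=30, inner_steps=None, seed=0):
    """Serial SVRG sketch for min_w (1/n) * sum_i 0.5 * (w * x_i - y_i)**2.

    Hypothetical helper for illustration; step size and epoch counts are
    assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    n = len(xs)
    if inner_steps is None:
        inner_steps = 2 * n  # assumed inner-loop length

    def grad_i(w, i):
        # Gradient of the i-th component function at w.
        return (w * xs[i] - ys[i]) * xs[i]

    w = 0.0
    for _ in range(epochs):
        # Snapshot the iterate and compute the full gradient there once.
        w_snap = w
        full_grad = sum(grad_i(w_snap, i) for i in range(n)) / n
        for _ in range(inner_steps):
            i = rng.randrange(n)
            # Variance-reduced gradient: unbiased for the full gradient at w,
            # with variance that shrinks as w and w_snap approach the optimum.
            v = grad_i(w, i) - grad_i(w_snap, i) + full_grad
            w -= step * v
    return w


if __name__ == "__main__":
    # Noise-free data generated with slope 2, so the minimizer is w = 2.
    print(svrg_least_squares([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

The correction term `grad_i(w_snap, i) - full_grad` is what distinguishes this from plain SGD: it allows a constant step size without the stochastic noise stalling convergence. An asynchronous variant would have several workers apply such updates to a shared `w` without locking, which this sketch does not attempt.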


Similar resources

Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization

We provide the first theoretical analysis of the convergence rate of the asynchronous stochastic variance reduced gradient (SVRG) descent algorithm on nonconvex optimization. Recent studies have shown that asynchronous stochastic gradient descent (SGD) based algorithms with variance reduction converge at a linear rate on convex problems. However, there is no work to analyze asy...


Asynchronous Distributed Semi-Stochastic Gradient Optimization

With the recent proliferation of large-scale learning problems, there has been a lot of interest in distributed machine learning algorithms, particularly those that are based on stochastic gradient descent (SGD) and its variants. However, existing algorithms either suffer from slow convergence due to the inherent variance of stochastic gradients, or have a fast linear convergence rate but at t...


IS-ASGD: Importance Sampling Accelerated Asynchronous SGD on Multi-Core Systems

Variance reduction (VR) algorithms for accelerating the convergence of stochastic gradient descent (SGD) have received considerable attention recently. Two such approaches, stochastic variance-reduced gradient (SVRG) and importance sampling (IS), have achieved impressive progress. Meanwhile, asynchronous SGD (ASGD) is becoming more important due to the ever-increasing scale of optimization problems...


Asynchronous Accelerated Stochastic Gradient Descent

Stochastic gradient descent (SGD) is a widely used optimization algorithm in machine learning. In order to accelerate the convergence of SGD, a few advanced techniques have been developed in recent years, including variance reduction, stochastic coordinate sampling, and Nesterov’s acceleration method. Furthermore, in order to improve the training speed and/or leverage larger-scale training data...


Variance Reduction for Distributed Stochastic Gradient Descent

Variance reduction (VR) methods boost the performance of stochastic gradient descent (SGD) by enabling the use of larger, constant stepsizes and preserving linear convergence rates. However, current variance reduced SGD methods require either high memory usage or an exact gradient computation (using the entire dataset) at the end of each epoch. This limits the use of VR methods in practical dis...



Publication date: 2015
